PROTOCOL: Behavioral, information and monetary interventions to reduce energy consumption in households: A “living” systematic review

Abstract This is the protocol for a Campbell systematic review. The objectives are as follows: Our proposed systematic review and meta‐analysis will integrate the evidence available from all sources to answer the following questions: (1) to what extent can information, behavioral and monetary interventions reduce energy consumption of households in residential buildings? (average treatment effect of interventions) (2) what is the relative effectiveness of interventions? (account for heterogeneity in treatment effects across and within studies) (3) how effective are combinations of different interventions?

However, rigorous solution-oriented knowledge on how to facilitate those pathways is missing (Berrang-Ford et al., 2020; Minx et al., 2017). In particular, no systematic assessment of the available scientific evidence exists to show which climate policies work, to what extent, in what context, when and for whom (Berrang-Ford et al., 2020). This living systematic review will demonstrate how scientific evidence on the effectiveness of one particular set of policy interventions, namely behavioral, information and monetary interventions in household energy demand, can be kept up-to-date to deliver rigorous solution-oriented knowledge to policymakers whenever they need it.
Finding low energy demand pathways is necessary to hedge against the risks involved in decarbonizing energy supply and is key for finding socially acceptable ways of meeting the Paris climate goals (Creutzig et al., 2022; Grubler et al., 2018; Van Vuuren et al., 2018).
There has been a surge of interest in using demand-side policies, particularly those aimed at behavioral change, to reduce energy demand and the consequent emissions. To reduce the energy demand of households, primary studies have experimented with monetary incentives and non-monetary interventions, including nudging, appealing to norms, and providing easily interpretable and credible information at the point of decision-making. The relevant literature is spread across various disciplinary fields: economics, psychology, power systems, and engineering. Our proposed review and meta-analysis will integrate the evidence available from all sources to compare the effectiveness of different types of interventions (sometimes deployed simultaneously) and to understand what drives the variation in outcomes across studies.

| The intervention
We will perform a systematic review and meta-analysis of the literature on interventions in residential energy demand to reduce energy usage.
These interventions include monetary incentives that offer households a tangible financial reward for reducing energy consumption, behavioral interventions like nudging, appealing to norms, and providing easily interpretable and credible information at the point of decision-making, as well as improving skills required to perform or forego behaviors. A detailed typology of the interventions along with a brief description of each type of intervention is given in Table 1.

| How the intervention might work
There is a rich theoretical literature on possible pathways that drive reductions in household energy consumption in response to monetary, information or behavior-based interventions. We do not aspire to provide a comprehensive review of the theory of change for each intervention.

| Why it is important to do this review
There is a well-established and fast-growing literature on monetary and non-monetary interventions in energy consumption in households, including nudging, appealing to norms, providing easily interpretable and credible information at the point of decision-making, and improving skills required to perform or forego behaviors.
The relevant literature is spread across various disciplinary fields: economics, psychology, power system studies.
Previous meta-analyses on this topic are intra-disciplinary (e.g., Abrahamse et al., 2005) and/or focused on a subset of interventions.
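For instance, Faruqui and Sergici (2013) focus on pricing interventions, Karlin et al. (2015) on feedback, and Andor and Fels (2018) on social comparison, commitment devices, goal setting, and labeling. Evidence continues to emerge in this fast-growing field, as technological advancement in the metering of energy and in information provision has made it easier to experiment with such interventions.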
The new knowledge gained from these studies may lead to changes in evidence-based recommendations for policymakers and practitioners.
However, there are long lags in incorporating new evidence. To prevent such gaps in knowledge, this review will systematically and continually update the evidence using the "living systematic review" concept, hitherto mainly discussed in clinical sciences (Akl et al., 2017; Elliott et al., 2017; Simmonds et al., 2017; Thomas et al., 2017). This living systematic review will provide up-to-date evidence on the efficacy of such interventions while simultaneously preventing duplication of effort in incorporating past studies.
This review also fills several methodological gaps: understanding the role of machine learning (ML) in updating reviews, resolving the statistical challenges in updating meta-analyses, and setting up guidelines for updating policy recommendations based on a living review.

| OBJECTIVES
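Our proposed systematic review and meta-analysis will integrate the evidence available from all sources to answer the following questions: (1) To what extent can information, behavioral and monetary interventions reduce energy consumption of households in residential buildings (average treatment effect of interventions)? (2) What is the relative effectiveness of interventions (accounting for heterogeneity in treatment effects across and within studies)? (3) How effective are combinations of different interventions?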

| METHODS
We start this study by replicating the methods from Khanna et al. (2021), as it provides a rigorous and comprehensive evidence base and serves as a suitable case for living evidence. We therefore adopt all the definitions and conventions from this study and add the elements required to turn it into a living systematic review and meta-analysis.
We will adhere to the MECCIR reporting standards and fill out the AMSTAR2 checklist along with the review. The relevant portions of the MECCIR standards for protocols have been filled in and are attached.

3.1 | Criteria for considering studies for this review

| Types of studies
We will include studies that conduct randomized controlled trials to estimate the effect of the relevant interventions. We will also include studies with quasi-experimental designs that estimate a causal effect, including difference-in-difference, instrumental variables (IV), longitudinal studies, and so forth. We will not include studies that are purely theoretical or that simulate effects using constructed data.

| Types of participants
We will include all studies that estimate the effect at the household level, or common living spaces like dormitories.

| Types of interventions
We will perform a systematic review and meta-analysis of the literature on interventions in residential energy demand to reduce energy usage.

Primary outcomes
Energy or electricity consumption of the household.

Secondary outcomes
N/A.

Duration of follow-up
We will separately code the duration of the baseline period, the duration of the intervention and the duration of the follow-up. However, it should be noted that, from our understanding of the literature, not many studies conduct follow-ups.

Types of settings
We will include all experimental and quasi-experimental studies conducted at the household level. We will only include studies that capture actual energy consumption behavior, so we will exclude simulation studies and studies conducted in a laboratory setting that only capture intent to save. This strategy is considered optimal because there is already a large literature that captures experiments in the real world, so evidence from online or laboratory experiments need not be included.

| Search methods for identification of studies
The starting point of our search would be the studies identified by Khanna et al. (2021); the search string to be used is given in Table 2.

| Electronic searches
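We will search all the relevant databases: Web of Science Core Collections Citation Indexes (the topic field, which includes title, abstract, author keywords and keywords plus), Scopus (title, abstract, keywords), JSTOR (title, abstract), RePEc (title and keywords), the web-based academic search engine Google Scholar (title), and Policy Commons (title, summary). Khanna et al. (2021) included papers identified through literature snowballing and we will implement the same.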
Since we will comprehensively search databases with a broad search query, we are likely to retrieve a large number of potentially relevant article abstracts from the bibliographic databases (~15,000 per year). To make the identification of relevant papers tractable, we will apply an ML algorithm that uses support vector machines to rank the studies identified by the search queries in order of the relevance of their abstracts. The best-performing ML classifier will be trained on the set of previously screened documents (N = 6023) and iteratively trained on newly screened abstracts. While Khanna et al. (2021) used an ad-hoc approach for deciding when to safely stop screening for additional studies, we will use a formal approach here to stop screening at the point when the probability of finding more relevant studies at a given recall level is minimal. This point is determined using a statistical stopping criterion that ensures 95% recall with statistical confidence (Callaghan & Müller-Hansen, 2020).
We extend the methodological frontier by using biased urn theory for the first time to capture the non-random nature of documents previously screened. The stopping procedure estimates the bias with which relevant documents are screened compared to irrelevant documents because of ML prioritization. This bias is then used to estimate the chances of observing previously seen sequences of irrelevant documents by chance, conditional on there being enough relevant documents among the as yet unseen documents. The statistical stopping criteria would be applied to the entire pool of abstracts (~106,000) identified previously as relevant by Khanna et al. (2021) and the abstracts identified at each update. From our experience in using ML along with stopping criteria, we expect that we will need to manually screen about 10,000 abstracts. While we aim to trigger the stopping criteria for the baseline review and yearly reviews at 90% recall and a p value of 0.05, higher or lower recall values could be realized depending on resource constraints for manual screening. We will compile a comprehensive review of all the available literature yearly, following the systematic stopping criteria approach (Callaghan & Müller-Hansen, 2020), whereas on a monthly basis we will screen 10% of available search results at the abstract level and code relevant documents. The statistical stopping criteria will be adapted suitably for application to a living review and the techniques to do so will be detailed in the review methodology.
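To make the logic of a statistical stopping test concrete, the following is a minimal sketch of a hypergeometric test of the null hypothesis that recall is still below the target, given a random sample drawn from the unseen documents. It is a simplified illustration only: it does not implement the biased-urn extension described above, and the function names, arguments, and example numbers are all hypothetical assumptions.

```python
from math import comb, floor


def hypergeom_cdf(k, population, successes, draws):
    """P(X <= k) for X ~ Hypergeometric(population, successes, draws)."""
    upper = min(k, successes, draws)
    favourable = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(upper + 1)
    )
    return favourable / comb(population, draws)


def stopping_p_value(n_urn, n_sampled, k_in_sample, relevant_before, recall_target):
    """p value for H0: 'recall is below recall_target'.

    n_urn           -- documents still unseen when random sampling started
    n_sampled       -- documents drawn at random from that urn so far
    k_in_sample     -- relevant documents found in the random sample
    relevant_before -- relevant documents found before random sampling
    """
    relevant_seen = relevant_before + k_in_sample
    # Smallest number of relevant documents the urn must have contained
    # for recall to be below the target (assumption of this sketch).
    k_tar = floor(relevant_seen / recall_target - relevant_before + 1)
    k_tar = max(0, min(k_tar, n_urn))
    # Probability of seeing this few relevant documents if H0 were true.
    return hypergeom_cdf(k_in_sample, n_urn, k_tar, n_sampled)


# Hypothetical numbers: 95 relevant documents found so far, 1,000 unseen.
p_small_sample = stopping_p_value(1000, 100, 0, 95, 0.95)
p_large_sample = stopping_p_value(1000, 500, 0, 95, 0.95)
print(p_small_sample, p_large_sample)
```

Under these assumptions, a long run of irrelevant documents in the random sample drives the p value down, and screening would stop once it falls below the chosen significance level.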
We will report study flow and selection using a PRISMA flowchart adapted for living systematic reviews. Campbell Systematic Reviews will link versions to the protocol and prior versions through its platform, since this is a consideration for the publication of living reviews.

| Searching other resources
We will search for gray literature on RePEc (title and keywords), Policy Commons (title, summary) and Google Scholar (title). For Google Scholar, we use Publish or Perish to download the relevant search results. We split the query by intervention type, implement partial queries separately, and retrieve the first 1,000 results available for each intervention. We use a similar query for searching Policy Commons and include documents from Working Papers, Conference Proceedings and Reports. Khanna et al. (2021) included papers identified through literature snowballing and we will implement the same.

| Description of methods used in primary research
The primary studies in our inclusion criteria compare the building energy consumption of households before and after an intervention (pre-post), or across treatment and control groups, or both before and after the intervention and across treatment groups (difference-in-difference, DID). The primary statistical methods used for analysis in these studies are difference of means, ordinary least squares regression, and panel regression with household/time fixed effects.
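As an illustration of the difference-in-difference design mentioned above, the sketch below computes a DID estimate from group means. The function name and the consumption figures are hypothetical and chosen only to show the arithmetic; primary studies typically estimate this quantity within a regression framework.

```python
def did_estimate(treat_pre, treat_post, control_pre, control_post):
    """Difference-in-differences on group means (e.g., kWh per household per day)."""
    def mean(values):
        return sum(values) / len(values)

    change_treated = mean(treat_post) - mean(treat_pre)
    change_control = mean(control_post) - mean(control_pre)
    # The control-group change proxies for what would have happened
    # to treated households without the intervention.
    return change_treated - change_control


# Hypothetical daily consumption (kWh) for a handful of households.
effect = did_estimate(
    treat_pre=[10.0, 12.0, 11.0],
    treat_post=[9.0, 11.0, 10.0],
    control_pre=[10.5, 11.5, 11.0],
    control_post=[10.5, 11.5, 11.0],
)
print(effect)
```

A negative value indicates that consumption fell in the treated group relative to the control group.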

| Selection of studies
The inclusion decision will be made based on the extensive inclusion-exclusion criteria provided in Table 3. Each study will be coded by one individual graduate student with a background in economics or psychology, so that they are trained to read the quantitative literature that is being reviewed (total number of students = 4). To achieve consistency across coders, a sample of at least five studies will be coded by all the coders and discrepancies in the coding will be discussed to resolve differences.

Table 2 (search string), intervention terms: (feedback OR pric* OR "time-of-use" OR "time-of-day" OR "real time" OR "peak" OR "dynamic pricing" OR "smart meter*" OR "smart grid*" OR (behavioral AND (economic* OR intervention* OR guideline*)) OR nudge* OR "choice architecture" OR norm OR norms or "normative" OR "social influence" OR "block leader" OR "public commitment" OR "social comparison" OR "social learning" OR "social modeling" OR "peer comparison" OR "peer (Continues)
Table 2 (search string), outcome terms: ("energy consumption"~15 OR "electric consumption"~15 OR "electricity consumption"~15 OR "gas consumption"~15 OR "energy conservation"~15 OR "electric conservation"~15 OR "electricity conservation"~15 OR "gas conservation"~15 OR "energy efficiency"~15 OR "electric efficiency"~15 OR "electricity (Continues)

| Data extraction and management
The studies will be coded by two graduate students and one research associate, all of whom have a background in economics. The risk of bias assessment will primarily be done by an experienced systematic review expert. To ensure reliability, the team will start by discussing the codebook and the interpretation of the various fields. The task will use examples given in Khanna et al. (2021). For abstract-level coding, all the members will code a set of 50 abstracts and discuss any discrepancies. We will report Cohen's κ for screening at the abstract level. For full-text screening, all the members of the team will code a set of 10 studies that were identified to represent the diversity of study designs that we are likely to encounter and the probable issues in coding. The members will then compare the coded papers with results from other team members and discuss discrepancies.
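For reference, Cohen's κ for two screeners can be computed as in the sketch below, which compares observed agreement with the agreement expected by chance. The function name and the include/exclude decisions are hypothetical; with more than two coders, pairwise values or a multi-rater statistic would be needed.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must label the same non-empty set of items.")
    n = len(rater_a)
    # Share of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's marginal label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / n**2
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical abstract-level decisions (1 = include, 0 = exclude).
coder_1 = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]
coder_2 = [1, 0, 0, 0, 1, 0, 1, 1, 0, 0]
kappa = cohens_kappa(coder_1, coder_2)
print(round(kappa, 3))
```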

| Assessment of risk of bias in included studies
For critical appraisal we will code, for each study, metrics of study quality covering aspects of internal and external validity, following the risk of bias framework suggested by the Collaboration for Environmental Evidence (Collaboration for Environmental Evidence, 2022). We have adjusted the CEE framework to be applicable to the specific data set we are working with, in terms of study designs and statistical techniques implemented in the primary studies. The detailed tool is added to the codebook. The risk of bias questionnaire will be filled out by the person coding the study. To ensure uniformity across studies, 10 studies were coded by all the coders and the results compared and discussed in detail.

| Measures of treatment effect
Design elements of original studies will be captured as dummy variables for the following variables: weather controls (if a study controls for any aspect of weather, it is assigned value 1); demographic controls (if a study controls for demographic variables like age, income, composition of the family etc., it is assigned value 1); residence controls (if a study controls for the characteristics of the house like size, etc., it is assigned value 1); and randomization (assigned value 1 if households are randomly assigned between interventions). We will also include as moderator variables the study design (difference-in-difference, control-treatment, or pre-post) and the statistical method (panel regression, OLS regression, or difference of means tests) employed in the studies. Other moderator variables will capture the factors that are likely to affect the relationship between energy use and the treatment, for example, the duration of the experiment or the region in which the experiment was performed.

| Unit of analysis issues
Randomization at the cluster level: we will code whether households in a given study were randomized at the cluster level (district, state, neighborhood) or at the household level. For studies where households were cluster randomized, we will also code whether the primary study calculated the effect accounting for such clustering (whether cluster standard errors were calculated). For studies where the effect of clustering was not accounted for, the standard errors are likely to be artificially small; in these cases we will multiply the standard error of the effect estimate (from an analysis ignoring clustering) by the square root of the design effect, making suitable assumptions for the intra-cluster correlation coefficient (ICC). We will also perform a sensitivity analysis to check the robustness of the results of the meta-analyses to the inclusion of the studies which do not account for clustering.

Table 2 (search string, continued), intervention terms: information" OR salience OR "commitment device*" OR "Pre-commitment" OR "precommitment" OR pledge OR "behavioral contract" OR "commitment contract" OR "commitment devices" OR "commitment approach*" OR "personal commitment" OR audit OR rebate OR reward OR incentives OR "goal setting" OR "home energy report" OR "in-home display" OR "information campaign"~3 OR "information provision"~3 OR "information strategies"~3 OR "information acquisition"~3 OR "information intervention"~3 OR "information system*"~3 OR "foot-in-the-door" OR "minimal justification" OR "applied game*" OR "serious game*" OR gamif* OR "dissonance" OR "goal setting" OR tariff OR "time-varying pricing")
Table 2 (search string, continued), outcome terms: efficiency"~15 OR "gas efficiency"~15 OR "energy use"~15 OR "electric use"~15 OR "electricity use"~15 OR "gas use"~15 OR "energy demand"~15 OR "electric demand"~15 OR "electricity demand"~15 OR "gas demand"~15 OR "energy usage"~15 OR "electric usage"~15 OR "electricity usage"~15 OR "gas usage"~15 OR "price responsiveness"~15)
Table 2 (search string, continued), Google Scholar: (household* OR residential) (information OR feedback OR price OR incentives) ("electricity consumption" OR "energy consumption" or "energy conservation")
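The standard-error correction for studies that ignore clustering, as described in this section, can be sketched as follows. The sketch assumes the commonly used design-effect formula 1 + (m − 1) × ICC, where m is the average cluster size; the protocol does not state the exact formula, and the function names, cluster size and ICC value are hypothetical.

```python
from math import sqrt


def design_effect(avg_cluster_size, icc):
    """Design effect under the common formula 1 + (m - 1) * ICC (an assumption here)."""
    return 1.0 + (avg_cluster_size - 1.0) * icc


def cluster_adjusted_se(se_naive, avg_cluster_size, icc):
    """Inflate a standard error that was computed ignoring clustering."""
    return se_naive * sqrt(design_effect(avg_cluster_size, icc))


# Hypothetical study: 21 households per neighborhood, assumed ICC of 0.05.
deff = design_effect(avg_cluster_size=21, icc=0.05)
adjusted = cluster_adjusted_se(se_naive=0.10, avg_cluster_size=21, icc=0.05)
print(deff, round(adjusted, 4))
```

Because the ICC is rarely reported, repeating the adjustment over a range of plausible ICC values is one way to carry out the sensitivity analysis.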
Repeated observations on participants: we will select the longest follow-up from each study. While this may induce a lack of consistency across studies, giving rise to heterogeneity, we will also code the duration of the study to capture the heterogeneity that this introduces.

| Criteria for determination of independent findings
Generally, we do not expect the studies to capture multiple outcomes. Most of the studies included in this literature are likely to report some form of reduction in energy consumption. It could be that some studies report the reduction in energy consumption for sub-groups of the population. In this case the metric reported for the whole sample would be coded and not the reductions reported for specific sub-groups. We explicitly exclude studies or observations that report reduction in energy consumption only for specific appliances or times of the day.
TABLE 3 Inclusion/exclusion criteria used for classifying studies.

| Dealing with missing data
We will write to authors of publications to obtain the data missing from studies.This would especially be done for studies that report outcomes but not the corresponding statistical variance.

| Assessment of heterogeneity
We will examine effect size heterogeneity by examining the results of the meta-analysis and will report the I² statistic for the models fitted.
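As a reminder of what the I² statistic measures, the sketch below computes Cochran's Q and I² from effect sizes and their sampling variances using the standard inverse-variance formulas. It is an illustration with hypothetical inputs, not the multilevel models to be fitted in the review.

```python
def q_and_i2(effects, variances):
    """Cochran's Q and the I^2 statistic (in percent) for a set of effect sizes."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    # Q: weighted squared deviations from the inverse-variance pooled mean.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    # I^2: share of total variation attributable to heterogeneity, floored at 0.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, i2


# Hypothetical effect sizes with equal sampling variances.
q_stat, i2_stat = q_and_i2([0.1, 0.5], [0.01, 0.01])
print(q_stat, i2_stat)
```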
FIGURE 2 Timeframe for baseline review. *Cut-off point for search for the baseline review. Search will continue at regular intervals for updates to the review.
FIGURE 1 Possible comparisons in Network Meta-Analysis.

| Assessment of reporting biases
We will assess publication bias using funnel plots and Egger's tests. If required, we will correct for publication bias using the PET and PEESE methods.
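The PET and PEESE corrections are weighted regressions of effect sizes on their standard errors (PET) or squared standard errors (PEESE), with inverse-variance weights; the intercept is read as the bias-corrected effect, and the PET slope is closely related to the asymmetry test in Egger's regression. The following is a minimal sketch with hypothetical data and function names, and it omits the standard errors and significance tests a real analysis would report.

```python
def wls_line(x, y, weights):
    """Weighted least squares fit of y = intercept + slope * x."""
    total = sum(weights)
    x_bar = sum(w * xi for w, xi in zip(weights, x)) / total
    y_bar = sum(w * yi for w, yi in zip(weights, y)) / total
    sxx = sum(w * (xi - x_bar) ** 2 for w, xi in zip(weights, x))
    sxy = sum(w * (xi - x_bar) * (yi - y_bar) for w, xi, yi in zip(weights, x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def pet(effects, std_errors):
    """PET: regress effects on standard errors with weights 1 / SE^2."""
    weights = [1.0 / se**2 for se in std_errors]
    return wls_line(std_errors, effects, weights)


def peese(effects, std_errors):
    """PEESE: regress effects on squared standard errors with weights 1 / SE^2."""
    weights = [1.0 / se**2 for se in std_errors]
    return wls_line([se**2 for se in std_errors], effects, weights)


# Hypothetical data in which less precise studies report larger reductions.
ses = [0.05, 0.10, 0.20, 0.40]
effects = [-0.02 - 0.5 * se for se in ses]
pet_intercept, pet_slope = pet(effects, ses)
print(round(pet_intercept, 4), round(pet_slope, 4))
```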

| Data synthesis
The studies in our review are expected to report effects in terms of relative change in energy consumption, but the exact dependent variable may vary across studies. We will first standardize the effects by converting the estimates of energy reduction reported by each study to semi-partial correlation coefficients or d-based effect sizes, as appropriate, and then convert them to Fisher's Z (Ringquist, 2013).
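The conversion to Fisher's Z can be illustrated with the textbook formulas below. The sketch assumes a simple correlation with sampling variance 1/(n − 3) and, for the d-to-r conversion, equally sized groups; the variance for semi-partial correlations depends on the number of covariates and is not covered here. Function names are hypothetical.

```python
from math import atanh, sqrt, tanh


def d_to_r(d):
    """Convert a standardized mean difference to a correlation (equal group sizes assumed)."""
    return d / sqrt(d * d + 4.0)


def fisher_z(r):
    """Fisher's Z transformation of a correlation coefficient."""
    return atanh(r)


def fisher_z_variance(n):
    """Approximate sampling variance of Fisher's Z for a simple correlation."""
    return 1.0 / (n - 3)


def z_to_r(z):
    """Back-transform Fisher's Z to the correlation scale for reporting."""
    return tanh(z)


# Hypothetical study: d = -0.2 estimated from 103 households.
z = fisher_z(d_to_r(-0.2))
print(round(z, 4), fisher_z_variance(103))
```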
We will use a random effects model to aggregate the standardized Fisher's Z from the original studies.A random effects model is appropriate when effect sizes in primary studies do not consistently converge to a central population mean (Nelson & Kennedy, 2009;Ringquist, 2013), which is typically the case in social science research and certainly the case for studies relating to energy consumption in households with heterogeneous treatment effects (Delmas et al., 2013).
We will use the most recent version of the metafor package in R (Viechtbauer, 2010) to fit the random effects model with the restricted maximum likelihood (REML) estimator. To check that no single study exerts undue influence on the aggregate effect sizes measured, we will follow best practices and use three metrics for estimation of influence (Cook's distance, covariance ratio and τ²) from the influence function in the metafor package. This function calculates the value of these metrics for each effect size included in the analysis.
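To show what a random effects pooled estimate involves, the sketch below uses the closed-form DerSimonian-Laird estimator of τ² as a simpler stand-in for REML, which requires iterative optimization, together with a basic leave-one-out check as a rough analogue of influence diagnostics. It is not the metafor implementation, and all names and numbers are hypothetical.

```python
from math import sqrt


def random_effects_dl(effects, variances):
    """Random effects pooling with the DerSimonian-Laird tau^2 (not REML)."""
    weights = [1.0 / v for v in variances]
    fixed = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    c = sum(weights) - sum(w * w for w in weights) / sum(weights)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    # Random effects weights add the between-study variance to each study.
    re_weights = [1.0 / (v + tau2) for v in variances]
    pooled = sum(w * y for w, y in zip(re_weights, effects)) / sum(re_weights)
    return pooled, sqrt(1.0 / sum(re_weights)), tau2


def leave_one_out(effects, variances):
    """Pooled estimate after omitting each effect size in turn."""
    return [
        random_effects_dl(effects[:i] + effects[i + 1:], variances[:i] + variances[i + 1:])[0]
        for i in range(len(effects))
    ]


# Hypothetical Fisher's Z values and sampling variances.
pooled, se, tau2 = random_effects_dl([0.1, 0.5], [0.01, 0.01])
print(pooled, se, tau2)
print(leave_one_out([0.1, 0.5, 0.3], [0.01, 0.01, 0.01]))
```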
The ordinary random effects model is inappropriate when the effect sizes included are not statistically independent (Ringquist, 2013).
Effect sizes are likely to be dependent in our sample as we extract multiple effect sizes from each study. In addition, several of the studies in our set employ multiple treatments and some use data from the same underlying experiments. We will employ a multilevel
FIGURE 3 Process for updating the living systematic review. *For Scopus and Web of Science. For other databases the search and screening will be done annually.
95% within a prespecified statistical confidence interval. At this point the screening will be stopped. The newly included studies will be coded. We plan to rerun the analysis on an annual basis. The updated analysis will feed into an updated manuscript that will be submitted to Campbell Systematic Reviews for peer review, and a draft will be published as a pre-print on the online repository.

| The problem, condition, or issue
Policymakers have little time left to prevent the worst impacts of climate change and limit global warming to well below two degrees (IPCC, 2022, 2023). Assessments by the Intergovernmental Panel on Climate Change (IPCC) have provided evidence about the extent of climate change and possible pathways for mitigation (IPCC, 2023).
on social comparison, commitment devices, goal setting, and labeling. Nisa et al. (2019) consider evidence from a wider range of household behaviors that are relevant for climate change mitigation but do not review interventions in energy consumption exhaustively. The meta-analysis by Delmas et al. (2013) was based on a narrower literature search and did not include studies published after 2012. A review with similar scope but with studies updated till 2019 was published by Buckley (2020). Khanna et al. (2021) provides a comprehensive meta-analysis but the literature on behavioral interventions is growing fast, so reviews need to be constantly updated. The latest systematic review by Khanna et al. (2021) contained 360 effect sizes from 122 studies with evidence from 25 different countries published by mid-2020. Since then, considerably more evidence has emerged in this fast-growing field of study as technological advancement in metering of energy and information
TABLE 1 Detailed typology of interventions.
questions: (1) to what extent can information, behavioral and monetary interventions reduce energy consumption of households in residential buildings?(average treatment effect of interventions) (2) what is the relative effectiveness of interventions? (account for heterogeneity in treatment effects across and within studies) (3) how effective are combinations of different interventions?
Khanna et al. (2021) through their comprehensive search of the relevant literature in previous systematic reviews and meta-analyses, and bibliographic databases. Khanna et al. (2021) used a prioritized screening approach (Callaghan & Müller-Hansen, 2020) to identify relevant evidence from over 64,000 studies at the abstract level (through a mix of manual and ML approaches), whereby 934 studies were screened manually at the full-text level and 122 relevant studies were identified, twice as many as identified by any previous systematic review with this scope and including all the studies identified by previous reviews. We will update this pool of relevant studies through string-based searches of bibliographic databases and gray literature since 2020, when the databases were searched by Khanna et al. (2021). The search string to be used will follow the PICOS (population, intervention, comparator, outcome, and study design) logic to target empirical studies covering one or more of such interventions and household energy consumption as the outcome variable (Table
keywords), JSTOR (title, abstract), RePEc (title and keywords) and the web-based academic search engine Google Scholar (title) and Policy Commons (title, summary). For Google Scholar, we use Publish or Perish to download the relevant search results. We split the query by intervention type, implement partial queries separately, and retrieve the first 1000 results available for each intervention. We use a similar query for searching Policy Commons and include documents from Working Papers, Conference Proceedings, and Reports. Khanna et al.
These interventions include monetary incentives that offer households a tangible financial reward for reducing energy consumption, behavioral interventions like nudging, appealing to norms, and providing easily interpretable and credible information at the point of decision-making as well as improving skills required to perform or forego behaviors.
Following Khanna et al. (2021), we classify the studied interventions into five categories: monetary incentives, information, feedback, social norms and motivation interventions. Refer to Table 1 for details about each type of intervention.